ABSTRACT

Traditionally, matching test formats have been avoided in favor of
multiple-choice items for several reasons, including item analysis
properties and chance performance characteristics. In the light of
research on test format and anxiety, this study postulates that, if a
matching test could assess knowledge of a given topic as effectively as
an analogous multiple-choice test, yet present a less threatening, less
anxiety-provoking situation, then the matching format should be utilized
for testing. Two experiments measured the comparative effectiveness of
each format for assessing student recall capabilities and for reducing
test anxiety. Sixty-four students from Los Angeles area high schools were
first administered 12 premise/response matching pair items and 12
multiple-choice items to assess prior knowledge. Test anxiety was
measured with an experimental Test Anxiety Inventory, and test
preference was measured with a three-item questionnaire. The second
experiment determined generalizability, testing recall of novel or
recently encoded material. Test subjects consistently favored the
matching tests, scored equally high on them, and experienced
significantly less debilitating test anxiety. The reduction of
anxiety was possibly due to successful test-taking strategies and
positive self-evaluations during testing. (CM)
INTRODUCTION

Research since the early 1950s has consistently verified the
significantly negative relationship between test anxiety and academic
achievement (Shaha & Wittrock, Note 1). S. B. Sarason and Mandler (1952)
were among the first to uncover a significant correlation between test
scores and test anxiety. In similar research, Alpert and Haber (1960)
found that both grade point averages (GPA) and examination scores are
predicted by test anxiety, and I. G. Sarason (1963) showed that
standardized test scores in mathematics and verbal skills are also
predicted by test anxiety.

Efforts to define the antecedent causes of test anxiety have led
to various interpretational theories. Nicholls (1976) defined test
anxiety as "self-evaluation," stating that test anxiety scores actually
reflect students' perceptions of their own inadequacies in testing
situations. Gaudry (1977) supported a similar theory, proposing that test
anxiety is caused by previous failures in testing situations. Hill (1972)
and Kirkland and Hollandsworth (1980) have proposed that test anxiety is
caused by poor test-taking skills. They independently concluded that
highly anxious children's lower test scores and lower school achievement
stem from inadequate test-taking strategies rather than from learning
deficiencies. High anxiety coupled with poor test-taking skills
interferes with the effective completion of tests.



Several treatments have been designed to reduce test anxiety in an
effort to increase academic achievement. Goldfried, Linehan, and Smith's
(1978) use of cognitive restructuring techniques reduced test anxiety and
raised test scores. Similarly, Meichenbaum (1972) increased test scores
through cognitive modification techniques which familiarized subjects with
their anxieties and then offered them systematic desensitization treatments
or ideal models to follow. Williams and Hill (Note 2) reduced test anxiety
and increased test scores of high anxiety students by modifying test
instructions. Changing instructions so that the testing situation appeared
less evaluative and threatening increased these subjects' scores
significantly. The altered instructions, however, caused a decrease in
test scores for middle and low anxiety subjects.

The critical issue remains whether test anxiety can be effectively
reduced, and test scores subsequently raised, without any negative
effects such as lowered scores for normally or low anxious students and
without resorting to costly treatments or special programs. The question
arises as to whether altering the form of a test, and not merely the
instructions, would have these desired effects on anxiety. In short, is
there a less threatening format for which most students have effective
test-taking strategies and which will efficiently assess students'
knowledge of a given subject area?

An informal questionnaire was administered to 150 students between
the ages of 8 and 26. The inquiry asked for free responses to only one
question: "Which type of test or test question makes you worry the least?"
Responses included oral exams, essays, fill-in-the-blank completion
questions, and others. One predominant response, however, was matching
tests. Traditionally, matching test formats have been avoided in favor of
multiple-choice items for various reasons, including item analysis
properties and chance performance characteristics (Popham, 1981; Shaha,
Note 3). However, if a matching test could assess knowledge for a given
topic area as effectively as an analogous multiple-choice test, yet
present a significantly less threatening, less anxiety-provoking
situation, then the matching format should be utilized instead of the
traditional alternative. It was upon this logic that the following
research was conducted.

Two experiments were designed to measure the comparative effectiveness
of analogous multiple-choice and matching tests for (1) assessing
student recall capabilities, and (2) reducing test anxiety. It was
anticipated that the matching test format would represent a significantly
less anxiety-producing stimulus and yet be equally effective for
measuring subject recall. Measurement effectiveness was determined to be
representable by item discrimination and difficulty.
EXPERIMENT I: METHOD

Subjects and Design

Sixty-four juniors and seniors from West Los Angeles area high schools
participated in three classroom groups (19, 22, 23 students) as voluntary
subjects.

Materials and Tasks

Twelve premise/response pairs were composed dealing with facts
(premises) about past Presidents of the United States (responses). All
pairs related to one common stem: "Which of the following Presidents listed
would you associate with the statement(s) given?" A matching test was
constructed using the 12 test pairs. Premises were listed vertically on
the left side of the test sheet, and responses on the right side. Three
extra Presidents' names were added to the response list as distractors.
Premises were numbered and responses lettered, and all were randomly
ordered. A blank space was provided next to each premise for recording
the letter corresponding to the selected response.

Twelve multiple-choice test items, drawing upon the response
alternatives and the additional three distractors described above, were
constructed. Each item had the same basic stem, one of the 12 premises
as the question, and four alternative response choices. Each of
the 12 responses was used as an alternative three times, and the three
additional distractors were used four times each. The completed test
was presented in a three-page booklet.
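
The allocation of alternatives described above can be checked with simple
arithmetic: 12 items with four alternatives each give 48 slots, and
12 responses used three times plus 3 distractors used four times also give
36 + 12 = 48. The sketch below builds one hypothetical allocation that
satisfies these counts. The labels R1 to R12 and D1 to D3 are placeholders
(the paper does not list the Presidents used), and the rotation scheme is
an assumption for illustration, not the authors' actual item layout; the
random ordering of alternatives is omitted.

```python
from collections import Counter

N_ITEMS = 12         # one multiple-choice item per premise (from the text)
N_ALTERNATIVES = 4   # alternatives per item (from the text)
N_EXTRA = 3          # additional distractor names (from the text)

# Placeholder labels; the actual names are not given in the paper.
responses = [f"R{i + 1}" for i in range(N_ITEMS)]
extras = [f"D{k + 1}" for k in range(N_EXTRA)]


def build_items():
    """Return one hypothetical allocation as (correct, alternatives) pairs."""
    items = []
    for i in range(N_ITEMS):
        correct = responses[i]
        alternatives = [
            correct,                        # the keyed response
            responses[(i + 1) % N_ITEMS],   # two other responses as foils
            responses[(i + 2) % N_ITEMS],
            extras[i % N_EXTRA],            # one extra distractor
        ]
        items.append((correct, alternatives))
    return items


items = build_items()
usage = Counter(alt for _, alts in items for alt in alts)

total_slots = N_ITEMS * N_ALTERNATIVES
assert total_slots == N_ITEMS * 3 + N_EXTRA * 4    # 48 == 36 + 12
assert all(usage[r] == 3 for r in responses)       # each response used 3 times
assert all(usage[d] == 4 for d in extras)          # each distractor used 4 times
```

Under this scheme every response appears once as the keyed answer and twice
as a foil, which is one way, among many, to meet the stated usage counts.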



Test anxiety was assessed via a posttest questionnaire based on the
Inventory of Test Anxiety (Osterhouse, 1972). The resulting experimental
measure, hereafter referred to as the Test Anxiety Inventory (TAI),
consisted of 16 Likert-scaled items and was designed to measure anxiety
felt during the test as reported in retrospect.

Test preference was assessed by a three-item questionnaire with the
following questions: (1) "Which of the two test forms did you prefer?",
(2) "Which test was easiest for you?", and (3) "Which test was faster?"

All materials and tasks described above were reproduced by photocopying
on 8½ x 11 inch (21 x 27½ cm) standard sized paper. Each of the
tasks was stapled into a separate booklet preceded by an instruction sheet.

Procedures

Subjects participated in their regular classroom groups in a design
counterbalanced for test sequence. Tests were distributed from a randomly
shuffled stack consisting of half of each test type. After subjects read
the instructions for their tests silently and all procedural questions were
answered individually, all subjects began the test tasks simultaneously.
Upon completion of the first test, an experimenter equipped with a stopwatch
collected the test, recorded the time-on-task (to the nearest 30 seconds),
and gave the student a TAI for completion. After completing the TAI, the
subject was given the second, opposing test form. Completion of the
second test was also followed by a second TAI. The final subject task was
to respond to the test preference questionnaire, after which students were
dismissed from the room.
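
The counterbalancing step described above (a shuffled stack holding half of
each test type, followed by a fixed task sequence) can be sketched as
follows. The function names, the seed parameter, and the handling of an
odd-sized classroom group are assumptions made for illustration; the paper
does not say how the stack was split when a group had an odd number of
students.

```python
import random


def assign_first_forms(n_subjects, seed=None):
    """Return a shuffled stack of first-test forms, half of each type.

    Assumption: with an odd group size the extra copy is multiple-choice.
    """
    rng = random.Random(seed)
    half = n_subjects // 2
    stack = ["matching"] * half + ["multiple-choice"] * (n_subjects - half)
    rng.shuffle(stack)
    return stack


def session_sequence(first_form):
    """Return the five tasks a subject completes, in order."""
    second_form = "multiple-choice" if first_form == "matching" else "matching"
    return [
        f"{first_form} test",
        "TAI #1",
        f"{second_form} test",
        "TAI #2",
        "test preference questionnaire",
    ]


# Example: one classroom group of 22 students.
stack = assign_first_forms(22, seed=0)
for form in stack[:2]:
    print(session_sequence(form))
```

Each subject still takes both formats; only the order differs, which is
what allows test sequence to be examined as a separate factor.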



No time limits were imposed for any experimental tasks. However,
as mentioned, time-on-task was monitored by recording starting and
finishing times for each of the two tests.

Results and Discussion

A one-way analysis of variance for time-on-task revealed no differences
between groups. Both test forms were scored for number of correct
responses. A two-way analysis of variance for test scores x test
sequence revealed no significant differences between scores for either test
format, and no significant main effect for test sequence counterbalancing.
There was also no significant interaction.

Insert Table 1 here

In order to compare assessment effectiveness between the traditional
multiple-choice format and the matching test alternative, detailed
analyses of item difficulty ratings and discrimination properties were
performed. Item discrimination refers to the consistency with which high
scoring subjects respond correctly to an item while low scoring students
err, meaning that the test truly discriminates between those who possess
and those who do not possess the requisite knowledge.

Table 1

Means and Standard Deviations:
Experiment I

Measure                          Matching Test   Multiple-Choice Test

Time-on-Task            Mean     [illegible]     5.32
                        SD       1.43            .99

Number of Correct       Mean     9.62            9.38
Responses               SD       2.53            3.10

Item Difficulty         Mean     .32             .38
                        SD       .18             .21

Item Discrimination     Mean     .73**           .53
                        SD       .23             .11

Test Anxiety            Mean     2.33            4.07**
                        SD       .49             1.12

** Significantly greater at p = .01

An item analysis was completed for each test. Item difficulty was
calculated as the proportion of responding students who scored incorrectly
(high proportion = high difficulty). Item discrimination was calculated
as the point-biserial correlation of correct/incorrect response patterns
with total test scores. Analysis of variance for item difficulty yielded
no significant differences between groups, meaning that neither test was
significantly harder or easier for the students. Analysis of variance
for item discrimination, on the other hand, yielded a significant F ratio
in favor of the matching test (F(1,62) = 9.41, MSerr = .21, p < .01). In
other words, the matching test more accurately discriminated between high
and low scorers (see Table 1), where high scoring subjects were more
consistently correct on the matching test.
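
The two item statistics defined in this section can be illustrated with a
short sketch. It follows the definitions given in the text: difficulty as
the proportion answering incorrectly, and discrimination as the
point-biserial correlation between a 0/1 item score and the total score.
The score matrix is invented toy data, the function names are
hypothetical, and the total score here includes the item itself; the paper
does not say whether a corrected total was used.

```python
from math import sqrt


def item_difficulty(item_scores):
    """Proportion of respondents who answered the item incorrectly."""
    return sum(1 for s in item_scores if s == 0) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total scores."""
    n = len(item_scores)
    right = [t for s, t in zip(item_scores, total_scores) if s == 1]
    wrong = [t for s, t in zip(item_scores, total_scores) if s == 0]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect responses")
    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores.
    sd_all = sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    if sd_all == 0:
        raise ValueError("total scores have zero variance")
    p = len(right) / n   # proportion correct
    q = 1 - p            # proportion incorrect
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    return (mean_right - mean_wrong) / sd_all * sqrt(p * q)


# Hypothetical 0/1 score matrix: rows are students, columns are items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in scores]
for j in range(len(scores[0])):
    column = [row[j] for row in scores]
    print(j + 1, item_difficulty(column), round(point_biserial(column, totals), 2))
```

A higher point-biserial value means that students who answered the item
correctly also tended to have higher total scores, which is the sense of
"discrimination" used in the comparison above.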

The Likert scale responses to the test anxiety questionnaires were
reduced to a mean anxiety score for each subject on each test. Analysis
of variance for anxiety showed that the matching test produced significantly
less test anxiety than the multiple-choice test (see Table 1). The
implication is that the matching test format presented a significantly less
threatening situation and hence produced significantly less test anxiety.
The test preference questionnaires, in which 63% of the subjects stated
that they preferred the matching test format, supported the conclusion
that the matching test is less threatening. The majority of the
respondents also claimed the matching test was both easier (83%) and took
less time to complete (53%). The claims by subjects concerning the
comparative time taken to complete the tests were especially interesting in
view of the fact that no significant differences were found for actual
time-on-task. This phenomenon was previously discovered and discussed by
I. G. Sarason and Stoops (1978).
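
The reduction of Likert ratings to a mean score per subject, and the form
of an F ratio with (1, 62) degrees of freedom, can be sketched as below.
This is a plain between-groups one-way analysis of variance, for which two
groups and 64 scores give df = (2 - 1, 64 - 2) = (1, 62); the paper does
not spell out how its analyses were laid out, so the sketch is illustrative
only. The ratings, the function names, and the response scale are
assumptions.

```python
def mean_anxiety(item_ratings):
    """Reduce one subject's Likert ratings on one TAI to a mean score."""
    return sum(item_ratings) / len(item_ratings)


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_error) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    ms_error = ss_within / df_within
    f_ratio = (ss_between / df_between) / ms_error
    return f_ratio, df_between, df_within, ms_error


# Hypothetical mean anxiety scores (toy data, not the study's data).
matching = [mean_anxiety(r) for r in ([2, 2, 2], [2, 3, 2.5], [3, 3, 3])]
multiple_choice = [mean_anxiety(r) for r in ([4, 4, 4], [4, 5, 4.5], [5, 5, 5])]

f_ratio, df_b, df_w, ms_err = one_way_anova([matching, multiple_choice])
print(f"F({df_b},{df_w}) = {f_ratio:.2f}, MSerr = {ms_err:.2f}")
```

The resulting F value would then be compared against the critical value for
the chosen significance level and degrees of freedom.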

Considered as a whole, the results of Experiment I support the
assertion that matching tests offer a significantly less anxiety-producing
format, as evidenced by anxiety scores and preference reports. Further,
as indicated by the significantly higher item discrimination indices for
the matching test items, the reduction in anxiety does not reduce test
effectiveness for discriminating between subjects familiar with topic
material and those with less knowledge.

EXPERIMENT II: METHOD 

The tests in Experiment I were designed to assess subjects' ability
to respond to questions based on prior knowledge. A second experiment
was conducted to determine whether the results of the first experiment
were generalizable to tests covering material either novel to or just
encoded by the subjects.

Subjects and Design

The same 64 high school juniors and seniors from Los Angeles area
schools participated in the identical classroom groups one week later.

Materials and Tasks

Following the same procedures used in the first experiment, tests
covering information about two topics were constructed: (1) Whales,
and (2) Far Eastern Religions. Twelve premise/response pairs were
developed for each topic and then converted into analogous matching and
multiple-choice tests. For encoding purposes, prose passages were then
composed based on the questions, and the passages were taped on cassettes.

Test anxiety, test format preference, and time-on-task were all
measured by devices identical to those used in the first experiment.



Procedures

Data were collected on separate, consecutive days for each of the
two topics. As in Experiment I, subjects completed the experimental
tasks in the following sequence: (1) Test format #1 (format determined
by random distribution procedures), (2) TAI #1, (3) Test format #2, (4)
TAI #2, and (5) Test preference questionnaire.

On the first day, subjects listened to the taped passage about Whales
(3 min. duration) while they read the identical passage silently. This
procedure was designed to maximize encoding. Instructions for the
encoding task warned subjects that they would be tested for their memory
of the stimulus information, but no reference was made to the mode or
manner of testing. The remaining experimental tasks were performed
without any further exposure to the stimulus material. The same procedure
was employed on the second day with the tape (3.5 min. duration) and
passage about Far Eastern Religions.

Results and Discussion

Scoring procedures were identical to those employed in the first
experiment. Analysis of variance for time-on-task yielded no significant
difference between test formats, regardless of topic (see Table 2).
A two-way analysis of variance for each topic area yielded no significant
effects for number of correct responses, for test sequence
(counterbalancing), or for the interaction.

Item analyses were conducted for all four tests. The tests measuring
recall of Whales revealed no significant differences for item difficulty
or for item discrimination statistics. The tests covering Far Eastern
Religions showed no significant differences for item difficulty, but did
produce a significant F ratio by analysis of variance between test types
for item discrimination (F(1,62) = 5.63, MSerr = .13, p < .05). It appears
that the matching tests were at least as effective assessment tools as the
multiple-choice formats.



Insert Table 2 here

Test anxiety data from both topics mirrored the results from
Experiment I. Matching test formats were significantly less
anxiety-producing for both the Whales (F(1,62) = 6.21, MSerr = .17) and Far
Eastern Religions topics (F(1,62) = 10.03, MSerr = .33). Test preference was
also decidedly in favor of the matching formats. Actual percentages of
subjects stating a preference for the matching test mode were 79% for
Whales and an overwhelming 93% for Far Eastern Religions. Questionnaires
again consistently echoed the finding that subjects perceived a shorter
time-on-task for the matching test formats (68%, 54% respectively), even
though analyses for time-on-task revealed no significant differences between
formats. Subjects also rated the matching test as easier (73%, 91%),
while no significant differences for scores were found.

Table 2

Means and Standard Deviations:
Experiment II

                                        Whales                  Far East Religions
                               Matching      Multiple-       Matching      Multiple-
Task                           Test          Choice Test     Test          Choice Test

Time-on-Task           Mean    4.85          4.96            7.34          6.86
                       SD      .79           .55             2.39          1.88

Number of Correct      Mean    10.03         9.85            7.38          6.33
Responses              SD      3.11          3.69            4.21          3.28

Item Difficulty        Mean    .45           .49             .67           .65
                       SD      .42           .27             .41           .28

Item Discrimination    Mean    [illegible]   .44             .78*          .68
                       SD      .11           .32             .44           .12

Test Anxiety           Mean    2.06          3.81**          2.89          4.42**
                       SD      1.31          .62             2.03          .41

* Significantly greater at p = .05
** Significantly greater at p = .01

CONCLUSIONS

The two experiments considered together clearly support the use of
matching test formats for assessing either prior knowledge or recall of
recently encoded material. Although test developers and theorists may
debate use of the matching test format (Shaha, Note 3; Burry, 1971),
these experiments suggest that multiple-choice tests should not
necessarily be preferred for either assessment effectiveness or anxiety
reduction when contrasted with the matching format. On the contrary,
subjects overwhelmingly and consistently favored the use of matching tests,
scored equally high on them, and experienced significantly less
debilitating test anxiety.

Perhaps the most interesting finding in these studies involves the
reduction of test anxiety, without any apparent ill effects, merely by
changing test format. This finding cannot be overemphasized. The
correlation between anxiety and both test performance and scholastic
achievement in general raises major concerns about the use of any
assessment technique which might unnecessarily increase anxiety and
decrease test performance.

One possible explanation for the reduction in test anxiety discovered
in these studies lies in successful test-taking strategies and positive
self-evaluations. Shaha (Note 3) found that subjects employ simple
elimination strategies when responding to matching test items; the easier
matches are made first, and made quickly and easily. The subject is
immediately reinforced, and his/her confidence increases as the elimination
strategies are found to be successful. Although the simple matches are
expended as the student proceeds and encounters more difficult
associations, the initial optimism does not wear off, as is evidenced by
post-experimental test preferences and post-test anxiety scores.

In summary, the "self-evaluation" theories discussed earlier are
supported here (e.g., Gaudry, 1977; Nicholls, 1976) as well as the
"test-taking skill" proposals (e.g., Hill, 1972; Kirkland & Hollandsworth,
1980). On the basis of their effective elimination strategies, students
feel reduced anxiety and increased competence. However, the optimism and
subsequent reduced anxiety are a student perception. Test scores,
time-on-task, and item difficulty data discount any actual superiority of
the matching test for ease or efficiency of strategy.

Reduction of test anxiety cannot be overemphasized. Since test anxiety
predicts both test scores and scholastic achievement in general, any
assessment technique which might unnecessarily increase anxiety should be
avoided. If a particular test format can lower anxiety and yield outcome
scores and assessment data equivalent to those obtained with other formats,
then the anxiety-reducing method should be employed. Certainly further
research by test developers is in order.